Abstract—Influential users have great potential for accelerating
information dissemination and acquisition on Twitter. How to 
measure the influence of Twitter users has attracted significant 
academic and industrial attention. Existing influence measurement
techniques are vulnerable to sybil users that are thriving
on Twitter. Although sybil defenses for online social networks 
have been extensively investigated, they commonly assume unique 
mappings from human-established trust relationships to online 
social associations and thus do not apply to Twitter where users 
can freely follow each other. This paper presents TrueTop, the 
first sybil-resilient system to measure the influence of Twitter 
users. TrueTop is rooted in two observations from real Twitter 
datasets. First, although non-sybil users may incautiously follow 
strangers, they tend to be more careful and selective in retweeting, 
replying to, and mentioning other users. Second, influential users 
usually get many more retweets, replies, and mentions than
non-influential users. Detailed theoretical studies and synthetic
simulations show that TrueTop can generate very accurate influence
measurement results with strong resilience to sybil attacks. 

Keywords—Influence measurement, social networks, Twitter, 
sybil resilience. 

I. Introduction

Twitter has become a powerful vehicle for large-scale
information dissemination. As of May 2014, Twitter has 
255 million monthly active users and 500 million daily tweets. 
This massive base of active users has triggered explosive uses 
of Twitter in marketing, journalism, public relations, massive 
information campaigns, entertainment, and during events of 
worldwide and national significance. 

Influential Twitter users have great potential for accelerating
information dissemination and acquisition. For example, to
launch a viral marketing campaign for a new product via
Twitter, a known strategy is for the marketer to seed the product
with a few selected influential users who can potentially
influence a disproportionately large number of others and also
quickly trigger a cascade of influence. As another example,
in the event of a national crisis, the governmental authority
can conduct a massive information campaign by disseminating
truthful information via influential users to effectively achieve
strategic goals and also counteract rumors. As the last example,
to have real-time situational awareness about a physical region
of interest, military agencies can recruit volunteers in the target
region via influential Twitter users there and then outsource the
collection of in-situ information to the volunteers.

The strong promise of influential users leads to the growing
attention on how to measure the influence of a Twitter user
0 - 0 - There are also over 20 commercial tools available 
for measuring twitterers' online influence. Common to these
research proposals 0 0 and commercial tools is to capture
the qualitative feature of online influence as "the ability to
cause effect, change behavior, and drive measurable outcomes
online" and to quantify a twitterer's online influence based
on his/her interactions with others.

The rise of social bots or sybils [7] in general on Twitter
is jeopardizing trustworthy influence measurement. In a sybil
attack, the adversary coordinates many fake accounts (also
called bots or sybil users hereafter) to unfairly overpower
non-sybil users. Despite various efforts to detect sybil users
on Twitter [8]-[13], sybil users are still thriving on Twitter.
For example, a recent study [14] revealed that at least 10%
of Twitter users are sybil users. Given the exclusive reliance
of existing influence measurement techniques on user interactions,
the adversary could coordinate his sybil users to create
arbitrary interactions to inflate their influence scores on Twitter.
Since influence scores are relatively defined, the adversary
could also effectively deflate the influence scores of non-sybil
Twitter users. According to our recent study [15], an adversary
controlling 1,000 sybil users can quickly generate an influence
score in the 95th percentile for any sybil user under popular
influence measurement tools such as Klout [16], Kred [17],
and Retweet Rank [18]. In a similar study [19], Messias et al.
used two social bots to successfully obtain high Klout scores.

The lack of sybil-resilient influence measurement services
on Twitter can be detrimental. Specifically, there is a growing
market for influence measurement services with more than
20 service providers available 0. If these service providers 
fail to provide trustworthy measurement results due to sybil 
attacks, they will have extreme difficulty getting customers 
and surviving, and their customers could not achieve effective 
information dissemination or acquisition as expected. 

The root cause for the vulnerability of existing influence
measurement techniques to sybil attacks lies in the incautious
use of user interactions. Specifically, Twitter permits four types
of publicly visible user interactions, including follow, retweet,
reply, and mention. The interactions about any user can be
further classified into incoming interactions towards him and
outgoing interactions from him. Since a sybil user can freely
follow, retweet, reply to, and mention other sybil or non-sybil
users, extensive outgoing interactions are fairly easy to create
and thus unsuitable for sybil-resilient influence measurement.
In addition, since sybil users could easily get many legitimate
followers [20]-[22], the number of followers each user has
should also be ruled out. In contrast, we observe from real 
Twitter data that non-sybil users tend to be more selective 
in retweeting, replying to, and mentioning other users. This 
observation is in line with the real-life scenario: one may 
exchange business cards with many strangers but will be more 
cautious in choosing whom to further interact with. This means 
that incoming retweets, replies, and mentions are much more 
trustworthy information for measuring user influence. Existing 
influence measurement techniques, however, use all incoming 
and outgoing interactions in a non-discriminative way. 

We propose TrueTop, a novel sybil-resilient influence measurement
system based on the incoming retweets, replies, and
mentions each Twitter user has. TrueTop provides on-demand
influence measurement services to various customers such as
business companies and government agencies. Given a target
set of Twitter users (e.g., those in a geographic area of interest),
TrueTop outputs a ranked list of top-K influential users for a
desirable integer K > 1. TrueTop is designed to be sybil-resilient
and also accurate, which means that the TrueTop
output contains a bounded number of sybil users and the true
top-K non-sybil users with overwhelming probability, respectively.

The main design challenge for TrueTop is that sybil users 
can arbitrarily interact among themselves, so it is not sybil- 
resilient to evaluate a user’s influence directly based on his 
total incoming retweets, replies, and mentions. We propose 
the following method to tackle this challenge. Given the target 
set of users, we first construct a weighted directed interaction 
graph, in which every vertex corresponds to a unique user 
in the target set. An edge from vertex a to vertex b exists 
if user a has ever retweeted, replied to, or mentioned user b, 
and the edge weight is proportional to the number of retweets, 
replies, and mentions from a to b. Imagine that the interaction 
graph consists of a virtual non-sybil region with all non-sybil 
users and a virtual sybil region with all sybil users. Given our 
previous observations, both the number of edges and the total 
edge weights from the non-sybil region to the sybil region 
should be much smaller than those in the reverse direction. 
Then we seed some carefully chosen vertices (or users) in the 
non-sybil region with some credits and let every vertex in the 
whole graph allocate its current credits to its direct successors 
proportionally to the corresponding edge weights in every 
iteration. After sufficient iterations, the top-K influential non-sybil
users are very likely to stand out, as they can accumulate
many credits due to their abundant incoming retweets, replies, 
and mentions. In contrast, the total credits flowing into the 
sybil region can be very limited, so even the sybil users with 
many incoming interactions from sybil followers may end up 
with few credits. We can thus achieve sybil-resilient influence 
measurement by counting the final credits at every vertex. 

This paper makes the following contributions. 

• We motivate and formulate the problem of sybil-resilient 
influence measurements on Twitter. 


• We propose TrueTop, a novel influence measurement
system that can identify the top-K influential users in
a target set of Twitter users with high accuracy in the 
presence of sybil attacks by exploiting the selectivity of 
non-sybil users in interacting with other users. 

• We confirm the high accuracy and sybil-resilience of 
TrueTop by detailed theoretical analysis and extensive 
experiments on real datasets. 

The rest of this paper is organized as follows. Section II
surveys the related work. Section III introduces Twitter basics,
our system and threat models, and our design objectives.
Section IV illustrates the TrueTop design. Section V theoretically
analyzes the accuracy and sybil resilience of TrueTop.
Section VI evaluates the performance of TrueTop by detailed
experiments. Section VII concludes the paper.

II. Related Work 

There is significant effort to explore social networks for
effective sybil defenses in various distributed systems, such as
SybilGuard and SybilLimit [24] for P2P networks, SumUp
[25] for online voting systems, and SybilInfer [26], SybilDefense
[27], and SybilRank for online social networks. A
common assumption is that each node can be mapped into
one in an undirected social network graph where every edge
corresponds to a human-established trust relation. Although the
attacker can create many sybil accounts, he cannot establish an
arbitrarily large number of social trust relations with non-sybil
users. Moreover, all schemes assume that the honest region is
fast mixing and separate from the sybil region. Built upon these
two key insights, these schemes apply various community
detection methods to limit the number of admitted sybil users
or their impact in various application scenarios.

Recent measurement studies have questioned these two
assumptions. Yang et al. [30] showed that sybil users on the
Facebook-like Renren network can have their friend requests
accepted by many non-sybil users. A similar result targeting
Facebook was reported in (H). Blending sybil users into the
non-sybil community would reduce the effectiveness of the
existing sybil defenses. In addition, the work in j^, p~5| ,
P^ , p0| , showed that sybil users successfully acquired
a number of followings from non-sybil users on Twitter. All
these findings indicate that neither bidirectional friendships in
Facebook-like OSNs nor unidirectional followings in Twitter-like
microblogging systems can be used as a trustworthy mirror
of real social relations. Moreover, it has been shown in
[34], [35] that the mixing time of many practical and directed
social graphs is much longer than previously expected. Since
neither of the two key assumptions underlying the schemes in
[23]-[29] holds in directed networks such as Twitter, they are
not directly applicable to our targeted scenario. Our TrueTop
system does not rely on either assumption.

As a special kind of sybil users, spammers on Twitter have
attracted considerable attention in recent years. A common
approach adopted by existing work ©-in), (36), (^ is 
to detect spammers by measuring the behavioral difference 
between spammers and legitimate users. Spammers are a 
special type of sybil users, and the detection of general sybil 
users on Twitter remains an open challenge. 
There is a rich literature on influence measurement on
Twitter. Cha et al. [1] found that the numbers of retweets and
mentions serve as better metrics than the number of followers
in measuring user influence. Bakshy et al. 0 proposed to
measure a user's influence based on his ability to post tweets
that generate a cascade of retweets. TwitterRank 0 combines
link structure and topical similarity between Twitter users
and uses a modified PageRank algorithm to calculate user
influence. Pal and Counts [38] also proposed a framework
to identify topical authorities in microblogging systems. All
these schemes are vulnerable to sybil users who can forge
arbitrary information employed by these schemes for influence
measurement. Moreover, many metrics used by these schemes
have been incorporated into commercial influence measurement
tools 0, and the vulnerability of representative tools to
sybil attacks has been experimentally verified in [15].

Also related is the research on modelling, measuring, and
analyzing the interactions in OSNs, e.g., [39]-[43]. Our work
is the first to build a weighted directed interaction graph from
historical incoming retweets, replies, and mentions on Twitter
and use it for identifying influential users.

III. Preliminaries 

A. Twitter Basics 

We illustrate the basic operations on Twitter to help understand
our design. The social relationships on Twitter are
unidirectional and established by following others. If user A
follows user B, A is B's follower, and B is A's followee. A user usually
needs no prior consent from his followees. Twitter also allows
each user to approve/deny every following request, but this 
option is relatively rarely used. A user can send text-based 
messages of up to 140 characters, known as tweets, which 
can be read by all his followers. Tweets can be visible to 
anyone with or without a Twitter account, and they can also be 
protected and are only visible to approved followers. There are 
three special kinds of tweets corresponding to three operations. 
A retweet is a re-posting of someone else's tweet, a reply
corresponds to a response to a tweet, and a mention refers to
inserting "@username" in a tweet to ensure that the specified
user can see this tweet. Finally, each user has a timeline which
shows all the latest tweets (including original tweets, retweets, 
replies, and mentions) of his followees. Also note that Twitter 
allows direct messages to be sent between users. Since those 
direct messages are not publicly visible, they cannot be used 
to measure user influence. 

B. System Model 

TrueTop is run by a service provider (SP) offering on-demand
influence measurement services to customers such as
viral marketers, government/military agencies, or even individuals.
Given a measurement request, the TrueTop SP first
determines the target set of Twitter users to evaluate, denoted
by U. The users in U can be directly given by the customer
or identified by the TrueTop SP according to some common
features specified by the customer. For example, the customer
can specify a target geographic region, a target age group,
a target topic (e.g., music), etc. As said, TrueTop relies on
incoming interactions among the users in U, i.e., the retweets,
replies, and mentions each user in U has received from all the
other users in U. So we assume that the SP has a reliable way
to obtain the incoming interaction data needed, e.g., directly
from Twitter, via crawling, or from some third-party providers
of social media data. For example, Gnip (http://gnip.com/)
is an authorized reseller of Twitter data. TrueTop is designed
to output a ranked list of top-K influential users in U, where
K > 1 denotes a customer-specified integer.

C. Threat Model 

Some users in U may be sybil users. We assume that
the SP knows neither which users in U are sybil users nor
how many sybil users there are; otherwise, the identified sybil
users can be simply removed from U. Based on the recent
measurement study l^, we assume that each sybil user may
have followed and also been followed by some non-sybil and
sybil users in U. There may be a single attacker controlling
all the sybil users in U or multiple independent ones with each
controlling an exclusive subset of them. TrueTop can deal with
both cases without modification, so we focus on the more challenging
former case hereafter. The goal of the attacker is to gain high
influence scores for his sybil users and maximize the number
of sybil users in the TrueTop output.

D. Design Objectives 

Let $U_K$ denote the TrueTop output, which is to be compared with
the true top-K non-sybil influential users in U. We have two major
design objectives.

• Accuracy: TrueTop should identify the true top-K non-sybil
users, which means the difference between the true top-K
non-sybil users and $U_K$ should be very small.

• Sybil resilience: TrueTop should not identify sybil users
as top-K users, i.e., the intersection of $U_K$ and the set of
sybil users in U should be very small.

IV. TrueTop Design

A. Overview 

TrueTop is motivated by the observation that incoming
retweets, replies, and mentions are more trustworthy for measuring
user influence than outgoing interactions. So our first
step is to construct an interaction graph, in which every vertex
corresponds to a unique user in the target set U, and every
directed edge indicates at least one retweet, reply, or
mention from the tail user to the head user. In addition, the
weight of every edge is a non-decreasing function of the number
of related retweets, replies, and mentions.

The next step is to choose a suitable metric to quantify
the influence of every user (vertex) in the interaction graph.
TrueTop adopts weighted eigenvector centrality (WEC for
short) [44], the de facto metric for measuring the influence
of a node in a weighted directed graph. Specifically, the WEC
score of every user corresponds to his influence score, which
depends on the weights of his incoming edges, the number
of his direct predecessors, and their influence scores which
are further determined by their respective incoming edges and
direct predecessors. The WEC score reflects an intuition that
the influence of a user is better indicated by the interactions
from influential users than those from less influential users.

We use iterative credit distribution because it makes our method
easier to describe and understand. Specifically, we select
some random users (called seeds) in the interaction graph and
seed each with some credits. In each iteration, we allocate
all the credits each user receives in the last iteration to his
direct successors proportionally to individual edge weights.
The credits each user receives in one iteration are expected
to stabilize after sufficient iterations and be proportional to
his WEC score. It can be easily shown that iterative credit
distribution is equivalent to power iteration [45], a standard
technique for computing WEC scores. Since sybil users can
create arbitrary interactions among themselves, some of them
may gain enough credits to appear in the top-K list. TrueTop
achieves high sybil resilience by carefully choosing the initial
seeds and also terminating iterative credit distribution early.

In what follows, we first illustrate the construction of
the interaction graph in Section IV-B. Next, we present an
iterative credit distribution scheme over the interaction graph
in Section IV-C. Finally, we introduce how to achieve sybil-resilient
iterative credit distribution in Section IV-D.


B. Interaction Graph Construction 

Given the target users U and their interaction data, TrueTop
first builds a weighted directed interaction graph denoted by
G = (U, V), where U is abused to denote the vertex set, and
every edge $v_{i,j} \in V$ ($i, j \in U$) is directed and indicates that
there are some retweets, replies, and/or mentions from user i
to j. The major challenge here is to determine the weight $w_{i,j}$
of every edge $v_{i,j}$. As shown in Fig. 1, G can be divided into a
virtual sybil region S including all the sybil users and a virtual
non-sybil region H including all the non-sybil users. The
sybil-resilience requirement for TrueTop requires that the sum of the
edge weights from the non-sybil region to the sybil region is
small, while the accuracy requirement for TrueTop demands
that the weight $w_{i,j}$ reflects the true influence of user j on i
in the target period. Let $I_{i,j}$ denote the set of time-indexed
retweets, replies, and mentions from user i to j. We consider
the following two methods for defining the edge weights.

• Sum-based. In this method, $w_{i,j}$ equals $|I_{i,j}|$. Sum-based
edge weights satisfy the sybil-resilience requirement, as
the total edge weights from the non-sybil region to the
sybil region are as limited as the number of retweets,
replies, and mentions from non-sybil users to sybil users.
They also partially satisfy the accuracy requirement, as
the more interactions from i to j, the more influence
j likely has on i, and the higher $w_{i,j}$. Sum-based
edge weights, however, fail to capture the temporal aspect
of interactions. For example, consider another direct
predecessor of j, say l, where $|I_{l,j}| = |I_{i,j}|$. Assume
that the interactions in $I_{l,j}$ occurred in the last few
days in the target period, while those in $I_{i,j}$ were
spread more evenly. It may be natural to say that j has
stronger influence on user i than on user l, but we have
$w_{i,j} = w_{l,j}$ for sum-based methods.

• Entropy-based. In this method, we divide the target
period into $\mu$ equal-length epochs for some system parameter
$\mu > 1$ and denote the total number of retweets,
replies, and mentions from user i to j in the xth epoch by
$d_x$, where $|I_{i,j}| = \sum_{x=1}^{\mu} d_x$. Then we define the edge
weight
$$w_{i,j} = \Big(1 - \sum_{x=1}^{\mu} \frac{d_x}{|I_{i,j}|} \log \frac{d_x}{|I_{i,j}|}\Big)\,|I_{i,j}|.$$
The more consistent the interactions from i to j in time, the higher
$w_{i,j}$, and vice versa. When all the interactions happen
in a single epoch, the weight is identical to the sum-based
$|I_{i,j}|$. Entropy-based edge weights can also satisfy the
sybil-resilience requirement, as non-sybil users are unlikely
to have consistent interactions with sybil users, so the
total edge weight from the non-sybil region to the sybil
region can be expected to be small. In contrast to sum-based
edge weights, entropy-based edge weights successfully
capture the temporal information in the interactions while
failing to reflect the volume of the interactions. So they
partially satisfy the accuracy requirement as well.
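
The two weight definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the representation of interactions as numeric timestamps, and the use of the natural logarithm are assumptions made here, since the text does not fix a log base.

```python
import math
from collections import Counter

def sum_weight(timestamps):
    """Sum-based weight: the number of interactions from i to j."""
    return float(len(timestamps))

def entropy_weight(timestamps, start, end, mu):
    """Entropy-based weight: (1 + H) * |I_ij|, where H is the entropy of
    the interaction counts over mu equal-length epochs (natural log assumed)."""
    if not timestamps:
        return 0.0
    epoch_len = (end - start) / mu
    # d[x] = number of interactions in epoch x (0-based); a timestamp equal
    # to `end` is clamped into the last epoch.
    d = Counter(min(int((ts - start) / epoch_len), mu - 1) for ts in timestamps)
    total = len(timestamps)
    entropy = -sum((n / total) * math.log(n / total) for n in d.values())
    return (1.0 + entropy) * total

# Four interactions packed into the last epoch vs. one per epoch.
bursty = [95, 96, 97, 98]
steady = [10, 35, 60, 85]
print(sum_weight(bursty), sum_weight(steady))
print(entropy_weight(bursty, 0, 100, 4), entropy_weight(steady, 0, 100, 4))
```

Both edges get the same sum-based weight, while the evenly spread interactions receive the larger entropy-based weight, which matches the behavior described for the two methods.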

The effects of the above methods are compared in Section VI.
There may be other ways to define the edge weights. For
example, we can let $w_{i,j}$ equal a linear combination of the
edge weights derived under the sum-based and entropy-based methods,
respectively; we can also assign different weights to retweets,
replies, and mentions according to the slightly different effort
and/or social implication related to performing these interactions.
A further study on such issues is left as future work due
to space constraints.

Note that we only consider retweets, replies, and mentions 
in the weight definitions because they are representative on 
Twitter and have been used in all the existing influence 
measurement techniques. Some other factors could also impact 
the user influence, such as following connections and favorites. 
As stated before, since sybil users could easily get many
legitimate followers [20]-[22], the following connections fail
to achieve the sybil resilience and hence should be ruled out
for the influence measurement. On Twitter, a user could favorite
the tweets from other users, but there is no public Twitter API 
which can return the favorite user list for any given tweet. 
Should a public Twitter API for retrieving favorites become 
available, we can easily incorporate favorites into TrueTop. 


C. Credit Distribution 

TrueTop uses the WEC score of every user in G = (U, V) as
his influence score. Specifically, let $\pi_i$ denote the WEC score
of user i in G and $W = (w_{i,j})$ denote the normalized weighted
adjacency matrix of G. The vector $\pi = (\pi_1, \pi_2, \ldots, \pi_{|U|})$ is the
dominant eigenvector of W, i.e., the solution to the equation
$\pi W = \pi$ according to [44].

Power iteration [45] is a common technique to compute the
WEC vector $\pi$. Let $v_0$ be a random vector composed of $|U|$
nonnegative elements totalling one. In power iteration, $\pi$ is
computed in an iterative fashion as

$$\pi = \lim_{t \to \infty} x^{(t)} = \lim_{t \to \infty} v_0 W^t, \qquad (1)$$

where $x^{(t)} = x^{(t-1)} W$ with the initial $x^{(0)} = v_0$. If G
is strongly connected, $\pi$ exists, is unique, and is unrelated
to $v_0$. In practice, power iteration normally terminates if
$\|x^{(t)} - x^{(t-1)}\| < \nu$ for some acceptable error threshold $\nu$

(e.g., 10“^). 
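
As a rough illustration of Eq. (1), the following pure-Python sketch runs power iteration on a tiny strongly connected graph. The helper names, the example weights, the default threshold, and the choice of the L1 norm for the stopping test are assumptions made for this sketch.

```python
def normalize_rows(weights):
    """Divide each row by its sum so that every row of W sums to one.
    Assumes every vertex has at least one outgoing edge."""
    return [[w / sum(row) for w in row] for row in weights]

def power_iteration(W, v0, nu=1e-10, max_iter=10_000):
    """Iterate x <- x W until the L1 change between iterations drops below nu."""
    x = list(v0)
    n = len(x)
    for _ in range(max_iter):
        nxt = [sum(x[i] * W[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < nu:
            return nxt
        x = nxt
    return x

# Hypothetical 3-user interaction graph; entry [i][j] is the raw weight i -> j.
W = normalize_rows([[0, 2, 1],
                    [1, 0, 1],
                    [3, 1, 0]])
pi_a = power_iteration(W, [1.0, 0.0, 0.0])
pi_b = power_iteration(W, [0.2, 0.3, 0.5])
print(pi_a)
print(pi_b)
```

The two runs start from different initial vectors and should agree up to the tolerance, which is the independence from $v_0$ stated above for strongly connected graphs.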

The WEC vector only exists in a strongly connected graph
[44], in which every vertex is reachable from every other
vertex. Although G itself may not be strongly connected in
practice, it usually has a giant strongly connected component
(GSCC) which includes the majority of the vertices and edges
and is dramatically larger than all other strongly connected
components (SCCs). Since the most influential users should
have intensive interactions with other users, the top-K influential
users should be in the GSCC with overwhelming
probability. Our subsequent operations thus apply to the GSCC
only. The verification of the existence of GSCC in real datasets
is deferred to Section VI.
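
One way to extract the largest strongly connected component is Kosaraju's two-pass algorithm. The text does not say which SCC algorithm TrueTop uses, so the sketch below is only one possible choice; the function name and the edge-list input format are assumptions.

```python
from collections import defaultdict

def largest_scc(edges):
    """Return the vertex set of the largest strongly connected component."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        nodes.update((a, b))

    # Pass 1: record vertices in order of DFS completion on the original graph.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next((v for v in it if v not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # each sweep collects exactly one SCC.
    best, assigned = set(), set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, todo = {root}, [root]
        assigned.add(root)
        while todo:
            node = todo.pop()
            for v in pred[node]:
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    todo.append(v)
        if len(comp) > len(best):
            best = comp
    return best

# Users a, b, c interact in a cycle; d and e are only reachable from it.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
print(sorted(largest_scc(edges)))
```

Both passes use explicit stacks rather than recursion, so the sketch does not hit Python's recursion limit on large graphs.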

TrueTop uses iterative credit distribution instead to compute
$\pi$ to facilitate the presentation. Initially, we randomly select a
few users (called seeds) in G and initialize each with the same
number of notional credits totalling one. At every iteration, we
allocate the credits each user receives in the last iteration to
his direct successors proportionally to the corresponding edge
weights. Let $c_j^{(t)}$ denote the number of credits at any user
j after t iterations, which are proportional to his influence
score measured after t iterations. $c_j^{(t)}$ is a real number in
general and can be computed as

$$c_j^{(t)} = \sum_{i \in \mathrm{IN}(j)} \frac{c_i^{(t-1)}\, w_{i,j}}{\sum_{k \in \mathrm{OUT}(i)} w_{i,k}}, \qquad (2)$$

where IN(j) and OUT(i) denote the direct predecessors of
user j and the direct successors of user i in G, respectively.
Similarly, we can terminate credit distribution when
$\sum_{j \in U} |c_j^{(t)} - c_j^{(t-1)}| < \eta$ for some acceptable error threshold

T] (e.g., 10“^). 

We can easily show that iterative credit distribution above
is equivalent to power iteration. In particular, assume that s
seeds are chosen in iterative credit distribution, each having
1/s credits initially. We further select $v_0$ for power iteration
such that the ith element equals 1/s if user i is a seed and
zero otherwise. Then Eq. (2) is apparently the element-wise
expression of $x^{(t)} = x^{(t-1)} W$. Since power iteration does not
depend on a specific $v_0$, we have $c_j^{(t)} = x_j^{(t)}$ for any user
j after t iterations.
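
The equivalence can be checked numerically on a small example. The sketch below implements the per-edge credit update and the matching matrix form side by side; the function names, the dictionary-based edge representation, and the toy weights are assumptions made for illustration.

```python
def distribute_credits(weights, seeds, iterations):
    """Per-edge credit distribution for a fixed number of iterations.

    weights: dict mapping (i, j) -> weight of the edge i -> j.
    seeds:   users that share one unit of credit equally at t = 0.
    """
    users = sorted({u for edge in weights for u in edge})
    out_sum = {u: 0.0 for u in users}
    for (i, _), w in weights.items():
        out_sum[i] += w
    credits = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    for _ in range(iterations):
        nxt = {u: 0.0 for u in users}
        for (i, j), w in weights.items():
            nxt[j] += credits[i] * w / out_sum[i]
        credits = nxt
    return credits

def power_iteration_steps(weights, seeds, iterations):
    """Matrix form x <- x W, with v0 putting 1/s on every seed."""
    users = sorted({u for edge in weights for u in edge})
    n = len(users)
    pos = {u: k for k, u in enumerate(users)}
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        W[pos[i]][pos[j]] = w
    W = [[w / sum(row) for w in row] for row in W]
    x = [1.0 / len(seeds) if u in seeds else 0.0 for u in users]
    for _ in range(iterations):
        x = [sum(x[i] * W[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(users, x))

# Hypothetical strongly connected 3-user graph with a single seed "a".
weights = {("a", "b"): 2, ("a", "c"): 1, ("b", "a"): 1,
           ("b", "c"): 1, ("c", "a"): 3, ("c", "b"): 1}
print(distribute_credits(weights, {"a"}, 5))
print(power_iteration_steps(weights, {"a"}, 5))
```

Both functions assume every user has at least one outgoing edge, which holds inside a strongly connected component.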

Iterative credit distribution described above is still subject to
sybil attacks. To see this, consider Fig. 1, where the interaction
graph is divided into a virtual non-sybil region H and a virtual
sybil region S. We denote the total edge weights within H,
within S, from H to S, and from S to H by $W_H$, $W_S$, $\alpha W_H$,
and $\beta W_S$, respectively, where $\alpha \ll 1$. Although the adversary
has no control over $W_H$ and $\alpha$, he can easily manipulate $W_S$
and $\beta$ to make $\beta W_S$ very small. Even if all the seeds are
chosen from H in the best scenario, more and more credits
will flow into and stay in S as time goes by. We have the
following proposition about the vulnerability of iterative credit
distribution to sybil attacks.


Fig. 1: The interaction graph with a virtual non-sybil region
H and a virtual sybil region S.

Proposition 1. Assume that the total edge weights from the
non-sybil region H to the sybil region S and from S to H
are $\alpha$ and $\beta$ fractions of the total edge weights in H and S,
respectively. The total credits in S increase monotonically with
the iteration t and asymptotically approach to 

The proof of Proposition 1 can be found in Appendix I-A.
Since the adversary can well control the topology within
S, most credits in S can go to a few sybil users who may
eventually appear in the top-K influential users.
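
The drift of credits into the sybil region can be seen in a deliberately simplified toy model. The sketch below collapses each region into a single vertex with a self-loop, so that H keeps weight $W_H$ on itself and sends $\alpha W_H$ to S, while S keeps $W_S$ and sends $\beta W_S$ back. This collapsing is an assumption made for illustration only; it is not the setting of the proof, and the limit of this toy model should not be read as the bound stated in Proposition 1.

```python
def sybil_credit_trace(alpha, beta, iterations):
    """Total credits held by the collapsed sybil vertex after each iteration,
    with all credits starting in the non-sybil vertex."""
    to_sybil = alpha / (1.0 + alpha)   # share of H's credits crossing into S
    to_honest = beta / (1.0 + beta)    # share of S's credits returning to H
    in_sybil, trace = 0.0, []
    for _ in range(iterations):
        in_sybil = in_sybil * (1.0 - to_honest) + (1.0 - in_sybil) * to_sybil
        trace.append(in_sybil)
    return trace

# Hypothetical parameters: few edges into S, and far fewer coming back.
trace = sybil_credit_trace(alpha=0.05, beta=0.001, iterations=50)
print(trace[0], trace[9], trace[-1])
```

Even with a small $\alpha$, the credits held in S keep growing from one iteration to the next when $\beta$ is much smaller, which is the intuition behind stopping credit distribution early.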

D. Sybil-Resilient Credit Distribution 

TrueTop adopts the following two defenses against sybil 
attacks such that most credits can stay in the non-sybil region 
for sufficient iterations. 

The first defense is to use non-sybil seeds only so that credit
distribution can start from the non-sybil region H. We propose
to use verified Twitter users as seeds for three reasons. First,
Twitter has certified their authenticity. Each verified user has
a blue verified badge on his profile page and is followed by
the official Twitter account @verified. Second, there are many
verified users available as candidate seeds. As of April 2014,
Twitter has verified more than 88,600 accounts among 255
million monthly active users and keeps verifying more. Since G
can be expected to contain many users in practice, there should
be at least one verified user in G with very high probability.
Finally, since verified users are usually public figures such as
politicians, celebrities, or business leaders, we can trust them
to be very cautious in whom to retweet, reply to, and mention.
This implies that the immediate successors of verified users on
the interaction graph G are very likely to be non-sybil users
as well, and so are the successors' immediate successors. If we
start credit distribution from verified users, most credits can
be expected to stay inside H after many iterations.

How many seeds should we choose? Some verified users
may be very close to the sybil region, but we cannot tell who
they are. Ideally speaking, we should choose the verified users
far from the sybil region. On the one hand, if a verified user is
randomly chosen as the sole seed, he may be too close to the
sybil region. On the other hand, if we use all the verified users
in G as the seeds, it is very likely that some of them are close
to the sybil region. In addition, the number of seeds affects the
convergence of iterative credit distribution: the more seeds, the
faster the algorithm converges. It is impossible to specify the
decisive rules for seed selection, so we randomly choose s > 1
seeds from the verified users in G and experimentally evaluate
the impact of seed selection in Section VI.

How should we assign the initial credits among the s seeds? 
We propose two methods as follows. 

• Basic method. The total credits are evenly assigned to the 
s seeds. This straightforward method assumes that each 
seed has the same importance for credit distribution. 

• Reverse-WEC. Since the credits flow out from the seeds,
we can assign more initial credits to the seeds who can
quickly reach more users to speed up the algorithm
convergence. For this purpose, we conduct the credit
distribution introduced in Section IV-C over an inverse
interaction graph generated from G by reversing the
directions of all the edges and also setting all the
edge weights to one. The final credits at each user
naturally reflect his connectivity in G. So we select
the verified users with the top-K highest credits as the
seeds and then assign to each of them the initial credits
proportional to their credits obtained via reverse credit
distribution.
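
The two assignment methods can be sketched as follows. Several details are assumptions made here because the text leaves them open: the reverse credit distribution starts from a uniform allocation over all users, it runs for a fixed number of iterations, and the number of selected seeds is passed as a hypothetical `num_seeds` parameter.

```python
def basic_seed_credits(seeds):
    """Basic method: split one unit of credit evenly among the seeds."""
    return {u: 1.0 / len(seeds) for u in seeds}

def reverse_wec_seed_credits(edges, verified, num_seeds, iterations=50):
    """Reverse-WEC sketch: score users on the reversed, unit-weight graph,
    keep the best-scoring verified users, and normalize their scores."""
    users = sorted({u for edge in edges for u in edge})
    # Reversing edge i -> j gives j -> i; every reversed edge has weight one.
    rev_succ = {u: [] for u in users}
    for i, j in set(edges):
        rev_succ[j].append(i)
    credits = {u: 1.0 / len(users) for u in users}  # assumed uniform start
    for _ in range(iterations):
        nxt = {u: 0.0 for u in users}
        for u, succs in rev_succ.items():
            for v in succs:
                nxt[v] += credits[u] / len(succs)
        credits = nxt
    ranked = sorted(verified, key=lambda u: credits.get(u, 0.0), reverse=True)
    chosen = ranked[:num_seeds]
    total = sum(credits.get(u, 0.0) for u in chosen)
    return {u: credits.get(u, 0.0) / total for u in chosen}

# Hypothetical graph: user "a" interacts with many others, so it reaches
# more users and should receive the largest share of initial credits.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "a"), ("c", "a"), ("d", "a"), ("b", "c")]
print(basic_seed_credits(["a", "b"]))
print(reverse_wec_seed_credits(edges, ["a", "b", "d"], num_seeds=2))
```

The sketch assumes the graph is strongly connected, as in the GSCC, so that every user has at least one reversed successor and the selected seeds hold non-zero credits.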

The second defense is to terminate iterative credit
distribution early, before it converges in the whole graph G. To see
the necessity and intuition for this defense, recall that we
start credit distribution from non-sybil seeds in the non-sybil
region. Since the total edge weight from the non-sybil region
to the sybil region is relatively small, we can expect credit
distribution to converge much faster in the non-sybil region
than in the whole G. In addition, the most influential non-sybil
users normally have many incoming interactions and thus
a rich number of credit sources in G. So they can quickly
accumulate a lot of credits to stand out much faster than
other non-sybil users. If we terminate iterative credit
distribution early, most or all of the sybil users would not get enough
credits to appear in the resulting top-K influential users, so
we can achieve sybil resilience. However, if credit distribution
stops too early, some true top-K influential non-sybil users
may not get enough credits to be ranked in the top-K list,
leading to an inaccurate result.

We design a simple but effective algorithm to tackle the
dilemma between sybil resilience and accuracy. The key idea is
to monitor the ranking change of the candidate top-K users in
two consecutive iterations. Whenever the ranking change is no
larger than an acceptable threshold, we terminate the algorithm
and output the current top-K users as the top-K influential
users. This algorithm is directly built on our observation above.
Specifically, since the top-K non-sybil influential users are
likely to stand out much faster than both sybil users and other
non-sybil users during credit distribution, their rankings are
also likely to become stable in fewer iterations. We
detail the algorithm as follows and postpone its performance
analysis to Section V.

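Let $R^{(t)}(u)$ and $R^{(t-1)}(u)$ denote the rankings of user $u$
in iterations $t$ and $t-1$, respectively, and let $U^{(t)}(K)$ denote
the candidate top-K users in iteration $t$. We define the ranking
distance $d(K)^{(t)}$ between $U^{(t)}(K)$ and $U^{(t-1)}(K)$ as

$$d(K)^{(t)} = \sum_{u \in U^{(t)}(K) \cup U^{(t-1)}(K)} \left| R^{(t)}(u) - R^{(t-1)}(u) \right| . \qquad (3)$$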
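The ranking distance can be computed as in the following minimal sketch. It assumes that the distance sums the absolute rank changes over the union of the two candidate top-K lists, and that both rankings are given as dictionaries covering the same users; the function name and data layout are hypothetical.

```python
def ranking_distance(rank_prev, rank_curr, k):
    """Sum of absolute rank changes over the users that appear in the
    top-k list of either iteration. Ranks are 1-based."""
    top_prev = {u for u, r in rank_prev.items() if r <= k}
    top_curr = {u for u, r in rank_curr.items() if r <= k}
    return sum(abs(rank_curr[u] - rank_prev[u]) for u in top_prev | top_curr)
```

If two users swap the second and third positions and K = 2, both of them belong to the union and each moves by one position, so the distance is 2.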


Algorithm 1: Find the top-K influential users

input : Interaction graph Q; s seed users; K; maximum
        number of iterations T; ranking-error tolerance ε
output: The top-K influential users
Assign initial credits among the s seed users by either the basic
or the reverse-WEC method;
t ← 1;
while t < T do
    Distribute the credits in the t-th iteration according to
    the credit distribution rule in Section IV-C;
    Rank the users by their credits and obtain the
    candidate top-K users $U^{(t)}(K)$;
    Compute the ranking distance $d(K)^{(t)}$ between
    $U^{(t)}(K)$ and $U^{(t-1)}(K)$ as in Eq. (3);
    if $d(K)^{(t)} \le \epsilon$ then
        break;
    t ← t + 1;
return $U^{(t)}(K)$ as the top-K influential users
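A compact executable sketch of this early-termination loop is given below. It is not the authors' implementation: it assumes that in each iteration a user forwards all of its credits along its outgoing edges in proportion to the edge weights, that a user without outgoing edges keeps its credits, and that the ranking distance sums absolute rank changes. All names (`distribute`, `rank_users`, `true_top`) and the dict-of-dicts graph layout are hypothetical.

```python
def rank_users(credits):
    """Map each user to its 1-based rank, highest credit first (ties by id)."""
    order = sorted(credits, key=lambda u: (-credits[u], u))
    return {u: i + 1 for i, u in enumerate(order)}


def distribute(credits, graph):
    """One iteration: each user forwards its credits to its out-neighbors
    in proportion to the edge weights (assumed distribution rule)."""
    nxt = {u: 0.0 for u in graph}
    for u, out in graph.items():
        total = sum(out.values())
        if total == 0:
            nxt[u] += credits[u]  # assumption: no outgoing edge, keep credits
            continue
        for v, w in out.items():
            nxt[v] += credits[u] * w / total
    return nxt


def true_top(graph, init_credits, k, max_iter, eps):
    """Run credit distribution and stop as soon as the top-k ranking
    changes by at most eps between two consecutive iterations."""
    credits = {u: init_credits.get(u, 0.0) for u in graph}
    ranks = rank_users(credits)
    for _ in range(max_iter):
        credits = distribute(credits, graph)
        new_ranks = rank_users(credits)
        top = {u for u, r in ranks.items() if r <= k}
        top |= {u for u, r in new_ranks.items() if r <= k}
        dist = sum(abs(new_ranks[u] - ranks[u]) for u in top)
        ranks = new_ranks
        if dist <= eps:
            break
    return sorted(ranks, key=ranks.get)[:k]
```

The graph must list every user as a key, with an empty dictionary for users that have no outgoing edges. Unlike the pseudocode, the sketch compares the first iteration with the ranking induced by the initial credits, which is one possible way to define the ranking before the first iteration.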


The algorithm above has two key parameters: T and ε.
The former dictates the maximum number of iterations, and
the latter specifies the maximum ranking error tolerance. The
larger T, the longer the algorithm execution time, the more
accurate the top-K influential users, the more credits flowing
into the sybil region and thus the lower the sybil resilience, and
vice versa. In contrast, the larger ε, the shorter the algorithm
execution time, the less accurate the top-K influential users,
the fewer credits flowing into the sybil region and thus the
higher the sybil resilience, and vice versa. In practice, we can let
ε < K, meaning that each user in the current top-K list has
experienced a ranking change of less than one on average
relative to the previous iteration.

V. Performance Analysis 

In this section, we analyze the accuracy and sybil resilience
of TrueTop. Recall that $U_K^*$ denotes the true top-K influential
users in the non-sybil region, $U_K$ denotes the TrueTop output
(i.e., the output of Alg. 1), and $U$ denotes all the sybil users in
the sybil region. So we can use $|U_K^* \cap U_K|$ and $|U_K \cap U|$ to measure
the accuracy and sybil resilience of TrueTop, respectively.

To make the performance analysis tractable, we first assume
that Alg. 1 runs in the non-sybil region only, so we can conduct
an upper-bound analysis of the accuracy of TrueTop by
setting the ranking error tolerance parameter ε = 0 and T
extremely large such that Alg. 1 terminates only when a stable
top-K user list is found. We then show that Alg. 1 will
terminate in asymptotically the same number of iterations for
ε = 0, based on which we finally estimate the number of
sybil users appearing in $U_K$. As stated before, the larger ε, the
shorter the algorithm execution time, the less accurate the
top-K influential users, the fewer credits flowing into the sybil
region and thus the higher the sybil resilience, and vice versa.
Hence by setting ε = 0, we can provide the lower and upper
bounds for sybil resilience and accuracy, respectively. As for
arbitrary ε > 0, we unfortunately cannot obtain a closed-form
analytical result for sybil resilience or accuracy and thus
resort to experiments to evaluate its impact in Section VI.

The following concepts are needed for the accuracy analysis. 

Definition 1 ((Relative) Error Bound). Let $\pi$ denote the true
WEC vector of non-sybil users and the k-ranked user refer
to the one with the kth highest WEC score $\pi_k$ in $\pi$. Let
$\pi_k^{(t)}$ denote the WEC score of the k-ranked user after iteration t.
Then $|\pi_k^{(t)} - \pi_k|$ is defined as the error bound for the
k-ranked node after iteration t, and $|\pi_k^{(t)} - \pi_k| / \pi_k$ is defined
as the relative error bound.

Definition 2 ((Relative) WEC gap). The WEC gap for the k-ranked
node is defined as $\Delta_k = \pi_k - \pi_{k+1}$, and $\Delta'_k = \Delta_k / \pi_k$
is the corresponding relative WEC gap.
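Both definitions translate directly into code. The following sketch assumes the WEC scores are available as a plain list of positive numbers; the function names are hypothetical.

```python
def relative_error(estimate, truth):
    """Relative error bound |pi_k^(t) - pi_k| / pi_k for one user."""
    return abs(estimate - truth) / truth


def relative_wec_gaps(scores):
    """Relative WEC gaps (pi_k - pi_{k+1}) / pi_k for k = 1, ..., n-1,
    computed after sorting the scores in decreasing order."""
    ordered = sorted(scores, reverse=True)
    return [(ordered[i] - ordered[i + 1]) / ordered[i]
            for i in range(len(ordered) - 1)]
```

For the scores 0.4, 0.3, 0.2, 0.1 the relative gaps are 1/4, 1/3, and 1/2, which shows that the relative gap depends on the shape of the score distribution and not only on the absolute differences.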

Lemma 1. Let W denote the normalized weighted adjacency
matrix of the non-sybil region with n users, among which there
are s seed users. Construct $v_0$ for power iteration (see Section IV-C)
such that the ith element equals 1/s if user i is a seed and
zero otherwise. Then the relative error bound for the k-ranked
user after iteration t is less than $\lambda^t$, where $\lambda < 1$ denotes W's second
largest eigenvalue.

Lemma 1 states that the rank of each user in iteration t
approaches its true rank for sufficiently large t. The proof of
Lemma 1 can be found in Appendix B.

In addition, Ghoshal and Barabási [46] recently found that
if the WEC vector (Pagerank in their paper) follows a power-law
distribution, the gap between the kth and (k+1)th WEC
scores decreases with k. We thus have the following lemma.

Lemma 2. If the WEC vector $\pi$ follows a power-law
distribution with parameter $\gamma$, the relative WEC gap for the
k-ranked user satisfies $\Delta'_k \approx \frac{1}{(\gamma - 1)k}$.

The proof of Lemma 2 is straightforward according to [46]
and omitted here due to space constraints. In Section VI we
show that the WEC vectors for real Twitter datasets indeed
follow the power-law distribution. We then have the following
theorem based on Lemma 1 and Lemma 2.

Theorem 1. For iterative credit distribution in a strongly-connected
weighted directed graph in which $\Delta'_k$ decreases
monotonically with $k$, if the relative error bound of the k-ranked
user is less than $\Delta'_k/2$ in iteration $t$, the ranked
list of users with top-k credits remains the same in subsequent
iterations.

The proof is in Appendix C. Theorem 1 indicates that if
there are no sybil users, Alg. 1 (or TrueTop) can generate the
true top-K influential non-sybil users if $\lambda^t < \Delta'_K/2$, i.e., when
$t > -\log(2K(\gamma - 1))/\log(\lambda)$ or $t = O(|\log(K)/\log(\lambda)|)$
iterations. This also corresponds to the case of ε = 0 with
100% accuracy. Since the total edge weights from/to the non-sybil
region to/from the sybil region are relatively very small,
we can expect that the sybil region has little impact on the
influence rankings of non-sybil users. So the accuracy of
TrueTop under sybil attacks is tightly related to how many
sybil users can show up in the top-K list, i.e., the sybil
resilience of TrueTop, as analyzed in the following theorem.
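The iteration count implied by Theorem 1 can be evaluated numerically. The sketch below assumes the power-law form of the relative WEC gap, i.e., that the stopping condition is $\lambda^t < 1/(2K(\gamma-1))$, and that $0 < \lambda < 1$; the function name and the example parameter values are hypothetical.

```python
import math


def iterations_for_stable_top_k(k, gamma, lam):
    """Smallest integer t with lam**t < 1 / (2 * k * (gamma - 1)),
    assuming 0 < lam < 1 and gamma > 1."""
    threshold = 1.0 / (2 * k * (gamma - 1))
    # Closed-form estimate, then adjust to guard against rounding.
    t = max(0, math.ceil(math.log(threshold) / math.log(lam)))
    while lam ** t >= threshold:
        t += 1
    return t
```

The result grows only logarithmically in K, so a tenfold larger top-K list adds a constant number of iterations for a fixed second-largest eigenvalue.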


Theorem 2. Let $\alpha$ be the ratio of the total edge weight from the
non-sybil region to the sybil region over the total edge weights
in the non-sybil region. Assume that the attacker wants to place
as many sybils into the top-K list as possible by retaining
all the credits flowing into the sybil region. The number of
sybil users in the top-K list after early termination in
$t = O(\log(K)/\log(\lambda))$ iterations is upper-bounded by
$K(1 - (1-\alpha)^t)/(1-\alpha)^t$.

The proof is in Appendix D. Accordingly, we can easily
derive the lower bound for the accuracy of TrueTop because
there are at least $K(2 - 1/(1-\alpha)^t)$ true top-K non-sybil users
in the final top-K list. Note that since $\alpha \ll 1$ and K is usually
on the scale of 1,000 or 10,000, this upper bound is far less
than K, meaning that only a negligible number of sybil users
appear in the top-K list.
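To get a feeling for the size of these bounds, they can be evaluated for concrete parameters. The sketch below simply codes the two expressions from Theorem 2 and the paragraph above; the function names are hypothetical, and the example values (α = 4.2e-5, t = 1,000, K = 100) are illustrative choices, not results reported in the paper.

```python
def sybil_upper_bound(k, alpha, t):
    """Upper bound K * (1 - (1 - alpha)**t) / (1 - alpha)**t on the
    number of sybil users in the top-K list."""
    keep = (1.0 - alpha) ** t
    return k * (1.0 - keep) / keep


def accuracy_lower_bound(k, alpha, t):
    """Lower bound K * (2 - 1 / (1 - alpha)**t) on the number of true
    top-K non-sybil users in the output."""
    return k * (2.0 - 1.0 / (1.0 - alpha) ** t)


bound = sybil_upper_bound(100, 4.2e-5, 1000)
```

The two bounds always add up to K, so a small upper bound on the sybil count directly gives a large lower bound on accuracy.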

VI. Evaluation 

In this section, we thoroughly evaluate the performance of 
TrueTop. We first introduce some implementation details and 
the runtime performance, followed by the datasets used in our 
evaluations. Next, we verify two underlying assumptions in our 
design. Finally, we evaluate the accuracy and sybil resilience
of TrueTop under various sybil attacks. 

A. Implementation and Runtime Performance 

TrueTop is composed of two main components: the interaction
graph construction and the credit distribution with early
termination. We implemented both with a total of 2,000+ lines
of mixed-language code, including Python. Specifically, to efficiently
handle large-scale interaction networks (millions of nodes
and billions of edges) on a commodity PC, we adopted the
GraphChi computing framework [47] to implement the credit
distribution of TrueTop. On our desktop with a 3.4GHz Intel i7-3770
CPU, 16GB of memory, a 7200RPM hard disk, and Ubuntu
12.04 LTS, a single iteration of credit distribution took
0.3s, 2.5s, 9.2s, and 17.1s for our four datasets in Table I
with 4K, 10K, 1M and 2M nodes, respectively. For a graph
with 2M nodes, TrueTop can thus find the top-1000 influential
users after 1,000 iterations in less than five hours on
a commodity PC. Since TrueTop is expected to be run by
a service provider with much more powerful computation
resources, its runtime performance should be acceptable.

B. Datasets 

We crawled four representative datasets with public Twitter
APIs. The SF and TS datasets include all the active users who
specified San Francisco Bay Area and Tucson, Arizona
in the location field of their public profiles in the crawling (or
target) period, respectively. In addition, the Random dataset
contains a random set of active Twitter users in the target
period, and the Music dataset contains the active users who
used the keyword “music” in their tweets in the target
period. Each dataset includes all the user IDs and also their
time-indexed tweets during the target period, which include
original tweets, retweets, replies, and mentions. Then we
constructed two interaction graphs for each dataset according
to the process in Section IV-B, one for sum-based edge weights
and the other for entropy-based edge weights.

TABLE I: Dataset Characteristics.

                                    SF                  TS               Random                Music
Crawling period                          8/30-11/30, 2013                      6/28-9/28, 2013
# of users                          176,506             5,827            1,999,834             999,807
# of edges                          1,493,924           40,031           63,803,204            34,688,854
# of users in GSCC                  104,000 (58.9%)     4,127 (70.8%)    1,541,343 (77.1%)     687,693 (68.9%)
# of edges in GSCC                  1,305,834 (87.4%)   36,189 (90.4%)   55,781,520 (87.4%)    30,170,774 (87.0%)
# of users in the 2nd largest SCC   357                 6                82                    21

Fig. 2: The distribution of WEC values.

Table I summarizes the basic statistics of the interaction
graphs of each dataset, which apply to both sum-based and
entropy-based edge weights. As we can see, each interaction
graph has a giant strongly connected component (GSCC)
which is far larger than the second largest SCC. Since TrueTop
measures user influence based on incoming interactions, the
top-K influential users are in the GSCC with overwhelming
probability. Our subsequent evaluations are thus done on the
GSCC of each interaction graph only. We obtained very similar
evaluation results for sum-based and entropy-based interaction
graphs. Due to space limitations, we report the results for
sum-based interaction graphs in most cases.


C. Feasibility Studies 

1) WEC Value Characteristics: TrueTop bases its early
termination of iterative credit distribution on two assumptions.
First, the WEC values of non-sybil nodes follow a power-law
distribution. Second, the relative WEC gap $\Delta'_k$ decreases as k
increases. Now we verify these two assumptions.

Fig. 2 shows the log-log CCDF of the WEC values. We can
see that all the CCDF curves are close to straight lines with 
the slopes from -2 to -1 for the WEC values larger than 10“^. 
Since a power-law distribution with PDF $p(x) = (\gamma - 1)x^{-\gamma}$
has a CCDF $F(x) = x^{-(\gamma - 1)}$, the WEC values of each interaction
graph follow a power-law distribution with parameter $\gamma$ from
2 to 3.

Fig. 3 shows $\Delta'_k$ as a function of k in the log-log scale,
where the results are shown up to k = 10^ due to space 
constraints. We computed the WEC values by using u = 10“^ 


as the error tolerance threshold of power iterations, which
led to about 1,000 iterations. $\Delta'_k$ clearly decreases with an
approximate slope of -1 in the log-log scale, which coincides
well with the analysis in Lemma 2.

2) Interaction Analysis: Since there is no benchmark for
the real-world sybils on Twitter, we designed an experiment 
to estimate the total edge weight from the non-sybil region to 
the sybil region in order to verify that it is relatively very 
small. To catch the growing intelligence of Twitter sybils, 
we adopted the behavior of the emerging social bots |[^, 
Hg, p9| . Our experiment ran as follows. We first purchased
1000 Twitter accounts, then divided them to mimic legitimate 
activities as in (H), fl^ , and finally investigated how many 
legitimate users will follow or interact with them. Specifically, 
we divided these 1000 accounts into five groups of equal 
size, each corresponding to a unique activity among following, 
tweeting, retweeting, mentioning, and replying. We ran the 
experiment for 30 days. In each day, we let each sybil user in 
each group initiate 10 activities corresponding to that group. 
For example, each sybil user in the Following group followed 
10 randomly-chosen new users in each dataset every day. 
Except the sybil users in the Tweeting group, the sybil users in 
all the other groups initiated the corresponding activities only 
towards randomly chosen new users in each dataset. We also 
recorded the total followings/mentions/retweets/replies every 
sybil group received each day. In addition, we chose the 
Random, SF, and Music datasets as the target datasets in 
the first 14, middle 8, and last 8 days, respectively. 

Fig. 4 shows the incoming-outgoing (I-O) ratios of each
sybil group, which is defined as the number of total follow- 
ings/mentions/retweets/replies each sybil group received every 
day over the total number of interactions initialized from the 
sybil group in the same day (i.e., 2,000). We have two obser¬ 
vations. First, non-sybil users are very careful about whom to 
interact with and rarely interact with sybil users. Second, sybil 
users can get a non-trivial number of non-sybil followers. We 
manually found that most non-sybil followers are normal users 
out of reciprocity, social capitalists, or even spam accounts 
not suspended by Twitter, and this observation is in line with 
prior results in p0| , ED So incoming followings are less 
trustworthy for evaluating user influence than incoming replies,
mentions, and retweets. 

To compute the I-O ratios of the sybil and non-sybil communities,
we randomly chose 30 groups of 200 users from each
of the Random, SF, and Music datasets. We then recorded the
incoming and outgoing interactions of each non-sybil group
every day in the same experimental period. The I-O ratio
for each sybil or non-sybil group is redefined as the total
incoming edge weight over the total outgoing edge weight.
Table II compares the average I-O ratios of the sybil and
non-sybil groups for both sum-based and entropy-based edge
weights. As we can see, non-sybil communities always have
much higher I-O ratios (i.e., much more balanced incoming
and outgoing interactions) than sybil communities. Moreover,
on average, the entropy-based weight model yields lower and
higher I-O ratios than the sum-based weight model for the sybil
and non-sybil communities, respectively. We thus expect that the
entropy-based weight model can lead to better sybil resilience
than the sum-based model (as shown in Table III).

Fig. 3: Relative WEC gap $\Delta'_k$.

Fig. 4: Incoming-outgoing ratios for sybil groups, where the same legend is used in all the figures.

TABLE II: The comparison of incoming-outgoing ratios between
sybil and non-sybil communities under sum-based and
entropy-based interaction graphs.

Graph Model   Community   SF     Random   Music
Sum           Non-sybil   0.89   1.04     0.70
              Sybil       0.08   0.08     0.08
Entropy       Non-sybil   1.54   1.15     0.60
              Sybil       0.04   0.07     0.05

D. Accuracy and Sybil Resilience Studies

1) Evaluation Methodologies: Since large-scale real experiments
on Twitter inevitably violate the Twitter ToS, we resort
to synthetic simulations to evaluate the accuracy and sybil
resilience of TrueTop. We used all four datasets and obtained
quite consistent results. Below we show the evaluation results
for the SF dataset only due to space constraints.

We modelled the strength of sybil attacks on Twitter by a
parameter $\alpha$, which refers to the ratio of the total edge weight
from the non-sybil region to the sybil region over that in the
non-sybil region. The default value of $\alpha$, denoted by $\alpha^*$, is
obtained from our datasets as follows. Assume that the network
is composed of a non-sybil region with $n_1$ twitterers and a
sybil region with $n_2$ twitterers. According to our experiments,
about 0.98‰ of the users in the SF dataset
have been suspended, so we set $n_1 = 1000 n_2$. Moreover,
assume that each non-sybil user initiates one interaction (i.e.,
retweeting, mentioning, or replying) to each of the other
$n_1 - 1$ users, leading to $n_1(n_1 - 1)$ outgoing interactions.
According to Table II, the average I-O ratio of the non-sybil
community for the sum-based interaction network is
(0.89+1.04+0.7)/3 ≈ 0.88. Therefore, the $n_1$ non-sybil users
have about $1.88 n_1(n_1 - 1) \approx 1.88 n_1^2$ incoming and
outgoing interactions in total. Similarly, the sybil users issue a total of
$n_2 n_1$ interactions to the non-sybil region and receive about
$0.08 n_2 n_1$ interactions from non-sybil users. We thus have the
following approximation:

$$\alpha^* \approx \frac{0.08\, n_2 n_1}{1.88\, n_1^2} \approx 4.2 \times 10^{-5}. \qquad (4)$$
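The arithmetic behind the default attack strength can be checked with a few lines of code. The sketch below assumes the sum-based I-O ratios of Table II (0.89, 1.04, 0.70 for non-sybil communities and 0.08 for sybil communities) and the ratio $n_1 = 1000 n_2$; the function name and argument layout are hypothetical.

```python
def default_attack_strength(io_nonsybil, io_sybil, n1_over_n2):
    """Approximate alpha* = io_sybil * n2 * n1 / ((1 + avg_io) * n1**2),
    which simplifies to io_sybil / ((1 + avg_io) * n1_over_n2)."""
    avg_io = sum(io_nonsybil) / len(io_nonsybil)
    return io_sybil / ((1.0 + avg_io) * n1_over_n2)


alpha_star = default_attack_strength([0.89, 1.04, 0.70], 0.08, 1000)
```

The value depends only on the I-O ratios and on the ratio of the region sizes, not on the absolute number of users.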


We used the following method to simulate the sybil region, 
which has been adopted in pS] , p8| . Given the interaction 
graph constructed from the SF dataset, we can expect that the 
majority of the 104,000 users there are non-sybil users, but 
we cannot tell which users are sybil or non-sybil users. So 
we manually attached to the original interaction graph a sybil
region which is a complete digraph of 500 sybil users and ran
TrueTop over this augmented interaction graph. We assume the
worst-case scenario in which the attacker aims to retain all the 
credits flowing into the sybil region, so there is no interaction 
from the sybil region to the non-sybil region. We then added 
Wg random links of weight one from the non-sybil region to 
the sybil region, which is equivalent to assuming that there 
are Wg accidental one-time interactions from non-sybil users 
to sybil users. Wg varied from 10 to 200 in our experiments. 
Since the total edge weight of the original interaction graph 
is about 10^, we effectively simulated the parameter a from 
10“^ to 2 X 10“^. To simplify the presentation, we equate Wg 
with a and call it the attack strength as well hereafter. 

We considered three strategies for the attacker to add the
Wg links. In the random attack, the attacker randomly selects
Wg users in the non-sybil region and adds a link of weight one
from each to a randomly chosen user in the sybil region. In the
community attack, the attacker performs a breadth-first search
from a random user in the non-sybil region until Wg users
are found, and adds a link from each discovered user to a
random user in the sybil region. In the seed attack, we fixed
10 seed users in the non-sybil region and assumed that the
attacker knows all of them. The attacker performs a breadth-first
search from the 10 seed users and randomly chooses Wg
users closest to any of the 10 seed users. It finally adds a
link of weight one from each of them to a random user in
the sybil region. Obviously, the seed attack corresponds to the
strongest attack. We conducted 50 experiments for each attack
and report the average result below. In addition, we chose 100
verified users as seed users in all simulations.
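The link-placement strategies can be illustrated with the following sketch. It makes several simplifying assumptions: the non-sybil region is given as a successor map, the breadth-first search follows edge directions and counts the start users themselves, and the seed attack is approximated by taking the first users reached from the seeds instead of sampling among the closest ones. All function names are hypothetical.

```python
import random
from collections import deque


def random_sources(users, num_links, rng):
    """Random attack: pick num_links distinct non-sybil users at random."""
    return rng.sample(sorted(users), num_links)


def bfs_sources(successors, starts, num_links):
    """Community or seed attack: breadth-first search from the start
    users until num_links users have been found."""
    seen = set(starts)
    queue = deque(starts)
    found = []
    while queue and len(found) < num_links:
        u = queue.popleft()
        found.append(u)
        for v in successors.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return found


def attach_sybil_links(sources, sybils, rng):
    """Add one link of weight one from each source to a random sybil."""
    return [(u, rng.choice(sybils), 1) for u in sources]
```

Passing a single random user as `starts` mimics the community attack, while passing the known seed users mimics the seed attack.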

Now we introduce some metrics to measure the accuracy 
and sybil resilience. Recall that $U_K$, $U_K^*$, and $U$ denote the
TrueTop output, the true top-K influential users in the non-sybil
region, and all the sybil users, respectively. We obtained $U_K^*$
by running power iteration over the non-sybil region only
with the error tolerance v = 10“^. We measure the accuracy 
of TrueTop by comparing $U_K$ and $U_K^*$ via the following two
types of errors.

• Type-I error: $d(K)/K$, where $d(K)$ is the distance
between $U_K$ and $U_K^*$ computed according to Eq. (3).
The metric measures the average rank offset of $U_K^*$ from
$U_K$.

• Type-II error: $K - |U_K^* \cap U_K|$. This metric measures
how many true top-K users are missed by TrueTop.
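Both error types can be computed from two rankings, as in the sketch below. It assumes that the output ranking and the true ranking are dictionaries over the same set of users with 1-based ranks, and that the distance sums absolute rank differences over the union of the two top-K lists; the function name is hypothetical.

```python
def type_errors(rank_out, rank_true, k):
    """Return (type-I error, type-II error) for the top-k lists induced
    by the output ranking and the true ranking."""
    top_out = {u for u, r in rank_out.items() if r <= k}
    top_true = {u for u, r in rank_true.items() if r <= k}
    # Type-I: rank distance between the two lists, averaged over k.
    distance = sum(abs(rank_out[u] - rank_true[u]) for u in top_out | top_true)
    # Type-II: number of true top-k users missing from the output.
    return distance / k, k - len(top_out & top_true)
```

If the output swaps the second and third users and K = 2, the type-I error is 1.0 and the type-II error is 1, because one true top-2 user is missed.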

The sybil resilience of TrueTop is inversely proportional to
#sybil = $|U \cap U_K|$. After iterative credit distribution in TrueTop
terminates, assume that a total of C credits are retained in the
sybil region. Let $C_1, \ldots, C_K$ denote the credits of the top-K
influential users in the non-sybil region in non-increasing
order. Also assume that the attacker tries to maximize #sybil
by arbitrarily manipulating the topology of the sybil region
such that the C credits can flow into a few sybil users. We
can derive #sybil as follows:

$$\#\text{sybil} = \begin{cases} 0 & \text{if } C < C_K, \\ \underset{1 \le x \le K}{\arg\max}\; \{\, C \ge x\, C_{K+1-x} \,\} & \text{else.} \end{cases}$$
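The worst-case number of sybil users in the top-K list can be computed with a short loop. The sketch below assumes that the credits of the top-K non-sybil users are sorted from largest to smallest, so that $C_K$ is the smallest, and that the attacker splits the retained credits C evenly among x sybil users; the function name is hypothetical.

```python
def num_top_sybils(sybil_credits, top_credits):
    """Largest x such that x sybil users, sharing sybil_credits evenly,
    each hold at least as many credits as the non-sybil user ranked
    K + 1 - x. top_credits must be sorted in non-increasing order."""
    k = len(top_credits)
    best = 0
    for x in range(1, k + 1):
        # top_credits[k - x] is C_{K+1-x} in 1-based notation.
        if sybil_credits >= x * top_credits[k - x]:
            best = x
    return best
```

With non-sybil credits 0.10, 0.05, 0.02, 0.01 and C = 0.05, two sybil users can enter the list: two sybils need 2 × 0.02 = 0.04 credits, while three would need 3 × 0.05 = 0.15.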
Fig. 5: TrueTop performance under different attack strengths.
Fig. 6: TrueTop performance for different Ks.


2) Basic Results: Fig. 5 shows the performance of TrueTop
under different attack strengths in random and community
attacks. In this experiment, we set K = 100 and ε = 0. As
the attack strength increases from 10 to 200, the type-I error
stays flat below one, and the type-II error is below two, both
showing the high accuracy of TrueTop under different attack
strengths. Moreover, the number of top-100 sybil users, i.e.,
#sybil, slowly increases as Wg increases, which is as expected.
#sybil, however, stays below four for both attacks. In addition,
a larger Wg is likely to increase the number of iterations and
thus make the top-K list more accurate. So we can see that
the type-II error overall decreases with increasing Wg.

Fig. 6 shows the performance of TrueTop under different
Ks in random and community attacks. In this experiment, we
set Wg = 100 and ε = 0. We also normalized #sybil by
K. Although #sybil/K slowly increases with K due to more
iterations, it is always less than 6%. In addition, both type-I
and type-II errors are always less than two, indicating the high
accuracy of TrueTop.

Fig. 7 shows the performance of TrueTop under different
values of ε in random and community attacks. In this experiment, we
set Wg = 100 and K = 100. As expected, the larger the
error tolerance ε, the larger both type-I and type-II errors.
In contrast, #sybil decreases with increasing ε due to fewer
iterations before credit distribution terminates.

Fig. 8 shows the performance of TrueTop under seed attacks
for both sum-based and entropy-based edge weights. In this
experiment, we set K = 100 and ε = 0. In addition, we
randomly selected Wg users from d = 3,000 immediate
successors of 10 random seed users, from which Wg links of
weight one were added to the sybil region. We can make three
observations from Fig. 8. First, TrueTop is still very accurate as
both type-I and type-II errors are always less than 2. Second,
seed attacks can yield more sybil users in the top-K list than
both random and community attacks. Finally, entropy-based
edge weights enable stronger sybil resilience than sum-based
edge weights, as the former can dramatically increase the total
edge weight in the non-sybil region relative to the total
edge weight from the non-sybil region to the sybil region. An
effective defense against the seed attack is deferred to Fig. 11.

TABLE III: The impact of different design options on TrueTop performance.

                                  Random attack              Community attack           Seed attack
                                  Type-I  Type-II  #sybil    Type-I  Type-II  #sybil    Type-I  Type-II  #sybil
Seed selection: basic vs. rwec    0.11    0.232    -0.017    -0.03   -0.19    -0.192    0.11    0.19     0.316
Edge weights: sum vs. entropy     0.07    -0.002   0.226     0.08    0.00     0.572     -0.071  0.327    0.871
# of seeds: 10 vs. 100            4.26    0.099    0.122     2.89    0.121    0.078     3.66    0.136

Fig. 8: Impact of seed attacks with different weight models.

Fig. 9: Comparing TrueTop with Kred, Pagerank and WEC
with power iteration under the random and community attacks.

Table III shows the impact of design choices on the TrueTop
performance. In this set of experiments, we set K = 100,
ε = 0, Wg from 10 to 200, and d = 3,000 for the seed
attack. We compared the basic and reverse-WEC methods
for seed selection, sum-based and entropy-based methods for
determining edge weights, and also 10 versus 100 seed users.
For simplicity, we added up the type-I errors, type-II errors,
and #sybil values under different attack strengths for each
design choice, respectively. For each pair of design choices, we
subtracted the sum of the second choice from that of the first
one for the type-I error, type-II error, and #sybil, respectively.
Since most results in Table III are positive, it is clear that
the second choice in each pair can achieve higher accuracy
and sybil resilience in most cases. Specifically, as expected,
the entropy-based weight model yields better sybil resilience
than the sum-based model.

3) Comparison with Other Methods: We compare our algo¬ 
rithm with the following methods. 


1) Kred (Tt). Since Kred has published its influence score
algorithm on http://kred.com/rules, we select it as
the benchmark mechanism. Kred computes the
influence score only from how many interactions a user has
received in the past 1,000 days. During our 90-day
experiment, we let each of the 500 sybils retweet each
other sybil once per day. Therefore, each sybil receives
44,910 interactions from the other sybils in the end. We will
see that this conservative attack is sufficient for filling
the top-K list with mostly sybils.

2) Pagerank [48]. One may think about using the Pagerank
value of each user in the interaction graph to evaluate
his influence. Modified power iteration with non-zero
reset probability is commonly used to compute Pagerank
values. We set the reset probability to 0.15.

3) WEC by power iteration. This method corresponds to 
TrueTop without early termination. 
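The number of interactions each sybil collects in the Kred attack follows from a simple count, sketched below with a hypothetical helper function: every sybil is retweeted once per day by each of the other sybils.

```python
def sybil_interactions(num_sybils, days, retweets_per_day=1):
    """Interactions received by one sybil when every other sybil
    retweets it retweets_per_day times on each day."""
    return (num_sybils - 1) * days * retweets_per_day


received = sybil_interactions(500, 90)
```

With 500 sybils and 90 days this gives 499 × 90 = 44,910 interactions per sybil, matching the figure quoted for the Kred experiment.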

Fig. 9 compares the number of top-100 sybils of TrueTop
with those of Kred, Pagerank and WEC by power iteration.
As we can see, TrueTop allows fewer than 4 sybil users in the
top-100 list under both random and community attacks. By
comparison, the sybils in Kred can easily occupy 99 positions
of the top-100 list. We also expect they would occupy all the
top-100 positions if more interactions between the sybils were
conducted. This is because the sybils can obtain unlimited
incoming interactions from other sybils. Under WEC with
power iteration, sybil users can occupy a significant portion
of the top-100 list, as many more credits flow into and stay in
the sybil region when power iteration terminates than under
TrueTop. In addition, Pagerank leads to more top-100 sybil
users than TrueTop and is less sensitive to the attack strength 
than WEC with power iteration. However, if we increase the 
number of sybil users from 500 to 1,000 without changing 
the attack strength, the top-100 sybil users under Pagerank 
will increase. This is because the more sybil users, the higher 
probability that credit distribution jumps to the sybil region 
due to resetting operations, the higher Pagerank values of some 
sybil users. So Pagerank is not sybil-resilient either, which is 
consistent with | [4^ . In contrast, both TrueTop and WEC with 
power iteration are insensitive to the size of the sybil region. 

Since WEC with power iteration is equivalent to seed-based
iterative credit distribution without early termination, we also
compare it with TrueTop with regard to the resilience to the
seed attack. Note that Pagerank is not vulnerable to the seed
attack because it does not use any seed user. Fig. 10 compares
the top-100 sybil users of the two methods under the seed
attack, where the number of immediate successors of the 10
victim seed users varies from d = 5,000 to 10,000 for the fixed
attack strength Wg = 100. As we can see, both methods yield
more top-100 sybil users as d decreases under sum-based and
also entropy-based edge weights. This result is quite intuitive:
the smaller d, the fewer nodes sharing the initial credits from
the seed users, the more credits flowing into the sybil region
over the Wg links, and vice versa.

An effective defense against the seed attack is to select
more seed users and/or choose the verified users with more
immediate successors as seed users. The efficacy of this
defense is shown in Fig. 11. In this experiment, we assume
that the attacker picked 10 random seed users and then
randomly selected d immediate successors of them for adding
the Wg links to the sybil region. We varied the number of
seeds from 10 to 800 for each value of d. As we can see,
we can dramatically improve the resilience of TrueTop to the
seed attack by increasing both the number of seed users and
the number of immediate successors of the seed users.

4) Remarks: We have three remarks on the performance 
evaluation above. First, our evaluation results demonstrate the 
lower-bound performance of TrueTop. Specifically, we adopted 
a very strong attacker model by assuming that the attacker 
withholds all the credits flowing into the sybil region by having 
zero interactions to the non-sybil region. In practice, sybil 
users often try to initiate interactions with non-sybil users for 


other purposes such as spamming and phishing than merely 
aiming to gain high infiuence scores. Therefore, we can expect 
fewer credits to stay in the sybil region than under our attacker 
model such that TrueTop shall have higher accuracy and sybil 
resilience in more practical settings. Second, we admit that our 
evaluations are not complete given so many design choices 
for TrueTop as shown in Table and many possible attack 
strategies. We have only shown some important results here 
as the examples and expect similar results for other design 
choices and attack strategies. Finally, we modelled the sybil 
behavior in accordance with prior work 0, (Tg, (Tg. There 
are more advanced sybil attacks such as astroturfing 
which could attract more legitimate interactions from non¬ 
sybil users. Unfortunately, there is no efficient way to simulate 
such advanced sybil attacks on a large scale. Instead, we use 
high attack strength Wg to model them in the experiment. 
As expected, TrueTop performs worse for higher Wg but still 
shows better performance in contrast to other methods. The 
performance of TrueTop will certainly degrade if the sybils 
could completely mimic the behavior of legitimate users, but 
manipulating the sybils to behave so intelligently will involve 
huge adversarial effort. TrueTop can thus significantly raise the 
bar for attacks on influence measurement.


VII. Conclusion 

Influential users are vital to accelerating large-scale information
dissemination and acquisition on Twitter. In this paper, we
presented TrueTop, which is, to the best of our knowledge, the first
sybil-resilient system to measure the influence of Twitter users. Our
theoretical studies and performance evaluations confirmed
the high accuracy and sybil resilience of TrueTop.
Appendix

A. Proof of Proposition 1 

Proof: Let the total credits in H and S at the t-th iteration be
$C_H^t$ and $C_S^t$, respectively. According to the credit distribution
defined in Eq. after the t-th iteration, the average credits
flowing from H to S and from S to H are $\alpha C_H^t$ and $\beta C_S^t$,
respectively. Meanwhile, the total credits in the whole network
remain constant at 1. Hence,

$$C_H^{t+1} = (1-\alpha) C_H^t + \beta C_S^t, \qquad C_S^{t+1} = \alpha C_H^t + (1-\beta) C_S^t, \qquad C_H^t + C_S^t = 1.$$

Since $\alpha \ll 1$ and $\beta \ll 1$, $C_H^t$ will decrease monotonically and
$C_S^t$ will increase monotonically. When $t \to \infty$,

$$C_H^t \to \frac{\beta}{\alpha+\beta}, \qquad C_S^t \to \frac{\alpha}{\alpha+\beta}.$$
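
The two-region credit flow described in this proof can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' code: in each iteration a fraction `alpha` of the credits in H is assumed to flow to S and a fraction `beta` of the credits in S to flow back, all credits are assumed to start in H, and the numeric values of `alpha` and `beta` are purely illustrative.

```python
def simulate_credit_flow(alpha, beta, iterations, c_s0=0.0):
    """Iterate the two-region recursion; return the sybil-region credits per step."""
    c_s = c_s0
    history = [c_s]
    for _ in range(iterations):
        c_h = 1.0 - c_s                       # total credits are conserved at 1
        c_s = c_s + alpha * c_h - beta * c_s  # inflow from H minus outflow to H
        history.append(c_s)
    return history


def closed_form(alpha, beta, t):
    """Solution of the recursion above when all credits start in H (c_s0 = 0)."""
    return alpha / (alpha + beta) * (1.0 - (1.0 - alpha - beta) ** t)


alpha, beta = 0.05, 0.02                      # hypothetical flow fractions, both << 1
hist = simulate_credit_flow(alpha, beta, 200)
print(hist[10], closed_form(alpha, beta, 10)) # the two values should agree
print(hist[-1], alpha / (alpha + beta))       # credits in S approach alpha/(alpha+beta)
```

Under these assumptions the credits in S grow monotonically toward alpha/(alpha+beta), so a smaller `beta` (fewer sybil-to-non-sybil interactions) leaves more credits in the sybil region.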


B. Proof of Lemma 1 

Proof: According to the Perron-Frobenius theory pO} , 
the matrix W is irreducible and has the largest eigenvalue
of 1, and all other eigenvalues are less than 1 in absolute value,
denoted as $1 = \lambda_1 > \lambda_2 > \lambda_3 > \dots > \lambda_n > -1$.
Moreover, if we denote the corresponding n eigenvectors as
$v_1, v_2, \dots, v_n$, then $|v_1| = 1$ and we denote $v_1$ as the WEC
vector $\pi$. Next, if W is diagonalizable, then $v_1, v_2, \dots, v_n$ can
be orthogonal and span the whole space $\mathbb{R}^n$. For the case
of non-diagonalizable W, we can use the Jordan canonical
form to transform it into a diagonalizable one.

Since $v_1, v_2, v_3, \dots, v_n$ are orthogonal, $v_0$ can be written
as

$$v_0 = \sum_{i=1}^{n} a_i v_i \qquad (5)$$

where $a_i \in \mathbb{R}$. We argue that if W is stochastic and irreducible,
then $a_1 = 1$. To see why, we first notice that since W is
stochastic, $W\mathbf{1} = \mathbf{1}$. It follows that $v_i^T W \mathbf{1} = \lambda_i v_i^T \mathbf{1} = v_i^T \mathbf{1}$.
The eigenvector corresponding to $\lambda_1$ is the stationary distribution
of the Markov chain W. Since W is irreducible, $\lambda_i \le 1$
and $\lambda_i \ne 1$ when $i \ne 1$. Thus we can see that $v_i^T \mathbf{1} = 0$ for
$i \ne 1$. Multiplying both sides of Eq. (5) by $\mathbf{1}$, it follows that
$v_0^T \mathbf{1} = a_1 v_1^T \mathbf{1}$. Since both $v_0$ and $v_1$ are non-negative vectors
with the sum of 1, we have $a_1 = 1$.

Thus Eq. (5) can be simplified as

$$v_0 = v_1 + \sum_{i=2}^{n} a_i v_i = \pi + \sum_{i=2}^{n} a_i v_i.$$

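Multiplying both sides by $W^t$ and repeatedly using the equation
$v_i^T W = \lambda_i v_i^T$, we can obtain

$$v_t^T = v_0^T W^t = \Big(\pi + \sum_{i=2}^{n} a_i v_i\Big)^T W^t = \pi^T + \sum_{i=2}^{n} \lambda_i^t a_i v_i^T.$$

Let $\lambda = \max(|\lambda_2|, |\lambda_n|)$. As $t \to \infty$, $\lambda^t$ will become
dominant and it follows that $|(v_t - \pi)_i| = O(\lambda^t)$.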


Moreover, for j G \ s, =0. Hence, 

n n 

Cj ^ = I Xj^OiVijl < X I ^ I ~ ^ ' 

i=2 i=2 
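
The geometric convergence claimed in this proof can be observed on a toy chain. The sketch below is an illustration under assumptions, not part of TrueTop: the 3-node matrix `W` is hypothetical and chosen so that its eigenvalues are known in closed form (1, 0.25, 0.25), which makes the decay rate easy to verify without an eigenvalue solver.

```python
def step(v, W):
    """One iteration v <- v W for a row vector v and a row-stochastic matrix W."""
    n = len(v)
    return [sum(v[i] * W[i][j] for i in range(n)) for j in range(n)]


def max_error_trace(v0, W, pi, iterations):
    """Return max_j |(v_t - pi)_j| for t = 0, 1, ..., iterations."""
    v, errors = list(v0), []
    for _ in range(iterations + 1):
        errors.append(max(abs(a - b) for a, b in zip(v, pi)))
        v = step(v, W)
    return errors


# Hypothetical chain W = 0.25*I + 0.25*J (J = all-ones matrix).
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
pi = [1 / 3, 1 / 3, 1 / 3]   # stationary distribution of this symmetric chain
v0 = [1.0, 0.0, 0.0]         # all initial credit on one node
lam = 0.25                   # largest non-unit eigenvalue modulus of W

errors = max_error_trace(v0, W, pi, 10)
for t in (1, 5, 10):
    print(t, errors[t], errors[0] * lam ** t)  # error shrinks like lam**t
```

For this chain the error equals the initial error times `lam ** t`; for a general irreducible stochastic matrix the proof only gives the order of decay, not equality.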


C. Proof of Theorem 1 

Proof: The conclusion is composed of two parts. We begin
with the first part, i.e., if $\lambda^t < \Delta'_k/2$ at the t-th iteration, then
$x_1 > x_2 > \dots > x_k$. Consider the k- and (k − 1)-ranked nodes.
Since the relative WEC gap $\Delta'_k$ is monotone decreasing for k,
we have

$$\epsilon'_k < \lambda^t < \Delta'_k/2 < \Delta'_{k-1}/2.$$

Combined with $\epsilon'_{k-1} < \lambda^t$ in Lemma 1, we can get $\epsilon'_{k-1} <
\Delta'_{k-1}/2$. In other words,

$$\epsilon_k = |x_k - \pi_k| < \Delta_k/2, \qquad \epsilon_{k-1} = |x_{k-1} - \pi_{k-1}| < \Delta_{k-1}/2.$$

By several operations, we have

$$x_{k-1} - x_k > ((\pi_{k-1} - \pi_k) - (\pi_k - \pi_{k+1}))/2 > 0$$

which holds since $\Delta_k/\pi_k < \Delta_{k-1}/\pi_{k-1} < \Delta_{k-1}/\pi_k$. Similarly,
we can find that $x_{k-2} > x_k, \dots, x_1 > x_k$. Moreover, if
starting from the (k − 1)-ranked node (it holds as $\epsilon'_{k-1} < \lambda^t <
\Delta'_{k-1}/2$), we have $x_{k-2} > x_{k-1}, \dots, x_1 > x_{k-1}$ and thus

$$x_1 > x_2 > \dots > x_k.$$

Then we prove the second part, i.e., if $\lambda^t < \Delta'_k/2$ at the t-th
iteration, then $x_k > x_j$, where j is from k + 1 to n. Consider
the k- and (k + 1)-ranked nodes. We have

$$\epsilon_{k+1} < \lambda^t \pi_{k+1} < \lambda^t \pi_k < \Delta_k/2, \qquad \epsilon_k < \lambda^t \pi_k < \Delta_k/2.$$

Hence,

$$x_{k+1} - x_k < 0$$

and so for all other nodes with rankings larger than k.

Since $\lambda^t$ is geometrically decreasing for t, $\lambda^t < \Delta'_k/2$ holds
for all the following iterations and so does the conclusion. ■
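
The half-gap condition used in this proof can be illustrated by brute force. The sketch below is not the authors' method: it assumes that each node's estimated score deviates from its true score by at most a relative error `rel_err` (standing in for the bound from Lemma 1), enumerates every worst-case sign pattern of that error, and uses a hypothetical score vector `pi` whose relative gaps decrease with rank.

```python
from itertools import product


def relative_gaps(pi):
    """Relative gaps (pi[k] - pi[k+1]) / pi[k] of scores sorted in decreasing order."""
    return [(pi[k] - pi[k + 1]) / pi[k] for k in range(len(pi) - 1)]


def order_preserved(pi, rel_err, k):
    """Check every worst-case sign pattern of a per-node relative error rel_err."""
    for signs in product((-1.0, 1.0), repeat=len(pi)):
        x = [p * (1.0 + s * rel_err) for p, s in zip(pi, signs)]
        top, rest = x[:k], x[k:]
        if any(top[i] <= top[i + 1] for i in range(k - 1)):
            return False          # order inside the top-k changed
        if rest and top[-1] <= max(rest):
            return False          # a lower-ranked node overtook the k-th node
    return True


pi = [0.50, 0.25, 0.15, 0.10]     # hypothetical scores; relative gaps decrease with rank
k = 3
half_gap = relative_gaps(pi)[k - 1] / 2

print(order_preserved(pi, 0.9 * half_gap, k))  # error below half the k-th relative gap
print(order_preserved(pi, 0.30, k))            # error well above it
```

With these toy scores the first check passes and the second fails, matching the role of the half-gap threshold as a sufficient condition for a stable top-k ranking.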


D. Proof of Theorem 2 
Proof: According to 

(4fe 


, the expected /c-ranked WEC is 

m 


According to Proposition 1, the number of credits for S after
the t-th iteration is given by:

$$C_S^t = 1 - C_H^t = \frac{\alpha}{\alpha+\beta}\left(1 - (1-\alpha-\beta)^t\right).$$

The maximum can be obtained as $1 - (1-\alpha)^t$ when
$\beta \to 0$, i.e., the sybils conduct very limited interactions with the
non-sybil users. Moreover, since the attacker wants to place as
many sybils into the top-K list as possible, he can just divide
the total credits by the K-ranked WEC value. Then the
number of sybils that own $\langle\pi\rangle_K$ credits is given by


$$n(K) = \frac{C_S^t}{C_H^t \langle\pi\rangle_K} \le \frac{1 - (1-\alpha)^t}{(1-\alpha)^t \langle\pi\rangle_K}.$$


Here we further approximate $\langle\pi\rangle_K$. For the power-law
distribution, $2 < \gamma < 3$. Thus

r(fc- 7 t) ^ r(fc-i) _ 1 
r(fc) ^ r(A:) “ k 

Hence, we can obtain $n(K) < K(1 - (1-\alpha)^t)/(1-\alpha)^t$. ■
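
The final bound is easy to evaluate for concrete parameters. The helper below is a sketch for illustration only: the function name `max_sybils_in_top_k` and the parameter values are hypothetical, and the formula is the one stated at the end of the proof, with `alpha` read as the per-iteration fraction of non-sybil credits flowing into the sybil region.

```python
def max_sybils_in_top_k(alpha, t, K):
    """Evaluate the bound K * (1 - (1 - alpha)**t) / (1 - alpha)**t."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    retained = (1.0 - alpha) ** t   # credits left in the non-sybil region as beta -> 0
    return K * (1.0 - retained) / retained


# Illustrative values: 1% leakage per iteration, 5 iterations, top-100 list.
bound = max_sybils_in_top_k(alpha=0.01, t=5, K=100)
print(bound)
```

The bound grows with both `alpha` and `t`, which is consistent with the observation that stronger attacks and more iterations let more credits accumulate in the sybil region.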